1 Introduction

We have one major question regarding data analysis. Specifically, would the data (e.g. the number of clusters) look different if you were to totally exclude “type 2” MD cells? Clearly, these are intermediate cTAL cells probably at the edge of the MD plaque, but definitely not true MD cells. Top markers for type 2 are Egf, Umod, Cldn19 which are known cTAL markers that are known to be NOT expressed in MD cells.

Question : Would the data look different if I exclude “type 2” MD Cells.

“Type 2” MD Cells are intermediate cTal cells that are usually at the edge of the MD plaque. They aren’t TRUE MD Cells.
Identifying these intermediate cTal cells could be done by using markers such as Egf, Umod, Cldn19
These markers are known to not be expressed in Macula Densa Cells

1.1 Version 1

Remove cells labeled “type 2” from the data set.

1.2 Version 2

Remove cells that have expression of Umod > 0 , Egf > 0

2 Loading Required Data/Packages

2.1 Loading Packages

2.2 Loading Dataset

JK_clean_MD.rds = Quality Controlled for Batch Effects / Bad Quality cells.

load(here("jk_code", "JK_clean_MD.rds"))

SO <- SO4

rm(SO4)

3 Past Data of Macula Densa

After viewing the data Type 2 has a clear high expression of the markers compared to other clusters.

DimPlot(SO,group.by = "subclass2_MD")

VlnPlot(SO,c("Egf","Umod","Cldn19"),group.by = "subclass2_MD")
## Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
## ℹ Please use the `layer` argument instead.
## ℹ The deprecated feature was likely used in the Seurat package.
##   Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: `PackageCheck()` was deprecated in SeuratObject 5.0.0.
## ℹ Please use `rlang::check_installed()` instead.
## ℹ The deprecated feature was likely used in the Seurat package.
##   Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

4 V1- Remove Type 2

4.1 Removing Type 2 From Dataset

Idents(SO) <- SO$subclass2_MD
SO2 <- subset(SO, idents = c("type_2"), invert = TRUE)

DimPlot(SO2)

4.2 Reclustering new data

This may be an overclustered Plot but I wanted to have room for rare/suttle differences

ElbowPlot(SO2)

SO2 <- SCTransform(SO2) %>%
    RunPCA() %>%
    FindNeighbors(dims = 1:15) %>%
    FindClusters(resolution = 0.6) %>%
    RunUMAP(dims = 1:15)
## Running SCTransform on assay: RNA
## Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
## ℹ Please use the `layer` argument instead.
## ℹ The deprecated feature was likely used in the Seurat package.
##   Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 14407 by 8813
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 5000 cells
## There are 2 estimated thetas smaller than 1e-07 - will be set to 1e-07
## Found 242 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 14407 genes
## Computing corrected count matrix for 14407 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 19.86145 secs
## Determine variable features
## Centering data matrix
## Place corrected count matrix in counts slot
## Warning: The `slot` argument of `SetAssayData()` is deprecated as of SeuratObject 5.0.0.
## ℹ Please use the `layer` argument instead.
## ℹ The deprecated feature was likely used in the Seurat package.
##   Please report the issue at <https://github.com/satijalab/seurat/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Different cells and/or features from existing assay SCT
## Set default assay to SCT
## PC_ 1 
## Positive:  Pappa2, Mir6236, CT010467.1, Slc12a1, Etnk1, Robo2, Sec14l1, Sfrp1, Wwc2, Col4a4 
##     Aard, Pde10a, Hsp90b1, Wnk1, Sptbn1, Itprid2, Col4a3, Zbtb20, Nos1, Bmp3 
##     Tmem158, Sgms2, Epcam, Tfcp2l1, Acsl4, Itga4, Trim2, Sema5a, Zfand5, Megf9 
## Negative:  Fos, Junb, Jun, Hspa1b, Hspa1a, Btg2, Egr1, Zfp36, Atf3, Ier2 
##     Fosb, Klf6, Socs3, Klf2, Jund, Gadd45g, Dnajb1, Dusp1, Tsc22d1, Rhob 
##     Ubc, H3f3b, Gadd45b, Mt1, Cebpd, Actb, Mt2, H2bc4, H1f2, Klf4 
## PC_ 2 
## Positive:  Mir6236, CT010467.1, Neat1, Malat1, Nme7, Fos, Kcnq1ot1, Slc12a1, Etnk1, Jun 
##     Junb, Lars2, Hnrnpa2b1, Srrm2, Ddx5, Egr1, Pappa2, Pnisr, Col4a4, Atrx 
##     Fosb, Ivns1abp, Paxbp1, Robo2, Zbtb20, Prpf4b, Dst, Wnk1, Btg2, Gls 
## Negative:  Fth1, Ldhb, Ubb, Wfdc2, Ftl1, Car15, Fxyd2, Prdx1, Cd63, Mpc2 
##     Cd9, Tmem213, Eif1, Mdh1, Bsg, Cox8a, Aldoa, Mt1, Ndufa4, Mgst1 
##     Ppp1r1a, Clu, Cox4i1, Ly6a, S100a1, Klk1, Chchd2, Itm2b, Cd81, Cox6c 
## PC_ 3 
## Positive:  Umod, Egf, Mt1, Sostdc1, Ckb, Apoe, Aebp1, Atp1a1, Mt2, Gpx6 
##     Cox6c, Chchd10, Defb1, Neat1, Mfsd4a, Car4, Fgf9, Lpl, CT010467.1, Ivns1abp 
##     Slc5a3, Tmem213, Atp4a, Ptger3, Wfdc15b, Fkbp11, Mgst3, Baiap2l2, Cox5b, Cox8a 
## Negative:  Pappa2, Aard, Tmem158, Mcub, Dctd, Wwc2, Ptgs2, Ptprz1, Iyd, Ramp3 
##     Cd9, Car15, Hsp90b1, S100g, Defb42, Tagln2, Cdkn1c, Ctsc, Bmp2, Btg2 
##     Calm1, Hspa5, Tsc22d1, Nadk2, H3f3b, Lmna, Lcn2, Epcam, Gsn, Pdia6 
## PC_ 4 
## Positive:  Hspa1b, Hspa1a, Jun, Mt1, Mt2, Ptger3, Hspb1, Fos, Ftl1, Id3 
##     Ddit4, Hspa8, Apoe, Aebp1, Cited2, Cd63, H2-D1, CT010467.1, Hoxb6, Gpx6 
##     Hoxb2, Hsp90ab1, Fau, Hsp25-ps1, Zfand5, Egfl6, Car15, Mgll, Irx1, Cyb5a 
## Negative:  S100g, Umod, Egf, Atf3, Tmem52b, Actb, Egr1, Gadd45g, Atp1b1, Ly6a 
##     Socs3, Pth1r, Igfbp7, Rcan1, Cxcl10, Sult1d1, Gadd45b, Klk1, Slc25a30, Wnk1 
##     Spp1, Kng2, Tsc22d1, Krt7, Klf6, Itga6, Fabp3, Nme7, Cox6c, Zfp36 
## PC_ 5 
## Positive:  Umod, Hspa1b, Egf, Klk1, Hspa1a, CT010467.1, Atp1b1, Ly6a, Pappa2, Car15 
##     Wnk1, Kng2, Sfrp1, Wfdc2, Clu, Atp1a1, Iyd, Pth1r, Fth1, Hsp90b1 
##     Mal, Cxcl10, Mir6236, Cldn10, Rnaset2a, Slc12a1, Sult1d1, Pik3r1, Igfbp4, Prdx5 
## Negative:  S100g, Mt1, Actb, Mt2, Egr1, Serf2, Tmsb10, Ppia, Atf3, Ptma 
##     Egfl6, H3f3b, Neat1, Slc24a5, Herpud1, Atpif1, Ranbp3l, Ftl1-ps1, Atp5md, Ier2 
##     Klf6, Efhd1, Alkbh5, Rcan1, Cebpd, Fosb, Sdc4, Zfp36l1, Atp5k, Ldhb-ps
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8813
## Number of edges: 265716
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7731
## Number of communities: 11
## Elapsed time: 1 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 11:16:12 UMAP embedding parameters a = 0.9922 b = 1.112
## 11:16:12 Read 8813 rows and found 15 numeric columns
## 11:16:12 Using Annoy for neighbor search, n_neighbors = 30
## 11:16:12 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 11:16:13 Writing NN index file to temp file /var/folders/7n/x74qctp91rng390gx0z9hmd80000gn/T//RtmpfFTIiz/fileef4db662723
## 11:16:13 Searching Annoy index using 1 thread, search_k = 3000
## 11:16:15 Annoy recall = 100%
## 11:16:16 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 11:16:17 Initializing from normalized Laplacian + noise (using RSpectra)
## 11:16:17 Commencing optimization for 500 epochs, with 369568 positive edges
## 11:16:17 Using rng type: pcg
## 11:16:24 Optimization finished
DimPlot(SO2)

VlnPlot(SO2,c("Egf","Umod","Cldn19"),group.by = "subclass2_MD")

4.3 Viewing Umod / Egf

DimPlot(SO2)

VlnPlot(SO2,c("Umod","Egf"))

FeaturePlot(SO2,"Umod")

4.3.1 Further QC Removing high expression of Umod/Egf

SO2 <- subset(SO2, idents = 10, invert = TRUE)


SO2 <- SCTransform(SO2) %>%
    RunPCA() %>%
    FindNeighbors(dims = 1:15) %>%
    FindClusters(resolution = 0.6) %>%
    RunUMAP(dims = 1:15)
## Running SCTransform on assay: RNA
## vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 14407 by 8789
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 5000 cells
## There are 1 estimated thetas smaller than 1e-07 - will be set to 1e-07
## Found 252 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 14407 genes
## Computing corrected count matrix for 14407 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 18.69488 secs
## Determine variable features
## Centering data matrix
## Place corrected count matrix in counts slot
## Set default assay to SCT
## PC_ 1 
## Positive:  Pappa2, Mir6236, CT010467.1, Slc12a1, Etnk1, Sfrp1, Aard, Sec14l1, Robo2, Wwc2 
##     Pde10a, Hsp90b1, Col4a4, Wnk1, Sptbn1, Itprid2, Tmem158, Epcam, Col4a3, Zbtb20 
##     Nos1, Bmp3, Car15, Sgms2, Dctd, Trim2, Rnf145, Acsl4, Tfcp2l1, Tmem59 
## Negative:  Fos, Junb, Jun, Hspa1b, Hspa1a, Btg2, Egr1, Zfp36, Ier2, Atf3 
##     Fosb, Socs3, Klf6, Klf2, Jund, Gadd45g, Dnajb1, Dusp1, Rhob, Tsc22d1 
##     Ubc, Gadd45b, H3f3b, Cebpd, Mt2, H1f2, H2bc4, Mt1, Actb, Klf4 
## PC_ 2 
## Positive:  Mir6236, CT010467.1, Neat1, Nme7, Malat1, Kcnq1ot1, Etnk1, Slc12a1, Lars2, Fos 
##     Srrm2, Hnrnpa2b1, Pappa2, Pnisr, Ddx5, Col4a4, Atrx, Robo2, Ivns1abp, Zbtb20 
##     Prpf4b, Jun, Paxbp1, Syne2, Wnk1, Dst, Gls, Junb, Zmiz1, Chka 
## Negative:  Fth1, Ldhb, Ubb, Ftl1, Wfdc2, Car15, Prdx1, Fxyd2, Cd63, Mpc2 
##     Cd9, Eif1, Tmem213, Mdh1, Mt1, Bsg, Aldoa, Cox8a, Ndufa4, Mgst1 
##     Clu, Ppp1r1a, Cox4i1, Ly6a, S100a1, Klk1, Chchd2, Cd81, Itm2b, Cox6c 
## PC_ 3 
## Positive:  Pappa2, Aard, Tmem158, Mcub, Dctd, Wwc2, Ptgs2, Ptprz1, Iyd, Ramp3 
##     Cd9, Hsp90b1, S100g, Car15, Defb42, Tagln2, Cdkn1c, Ctsc, Bmp2, Tsc22d1 
##     Hspa5, Calm1, Nadk2, Lcn2, Btg2, Lmna, Pdia6, Il1f6, H3f3b, Gsn 
## Negative:  Umod, Egf, Sostdc1, Mt1, Ckb, Apoe, Aebp1, Mt2, Atp1a1, Gpx6 
##     Cox6c, Defb1, Chchd10, Neat1, Mfsd4a, Car4, Fgf9, Lpl, Tmem213, Ivns1abp 
##     Atp4a, Slc5a3, Ptger3, CT010467.1, Fkbp11, Wfdc15b, Mgst3, Baiap2l2, Cox5b, Col5a2 
## PC_ 4 
## Positive:  Hspa1b, Hspa1a, Jun, Hspb1, Fos, Id3, Ddit4, Ptger3, Cited2, Hspa8 
##     Ftl1, Car15, Hsp90aa1, Hsp25-ps1, Hoxb2, Ier3, H2-D1, C2cd4b, Irx1, Fau 
##     Apoe, Fth1, Hsp90ab1, Aebp1, Cd63, Gpx6, Mt1, Itm2b, Cyb5a, Bst2 
## Negative:  S100g, Umod, Egf, Atf3, Egr1, Socs3, Actb, Tmem52b, Gadd45g, Gadd45b 
##     Rcan1, Cxcl10, Klf6, Igfbp7, Ly6a, Atp1b1, Zfp36, Tsc22d1, Spp1, Rrad 
##     Slc25a30, Itga6, Pth1r, Fosb, Sult1d1, Cox6c, Krt7, Fabp3, Mir6236, Nme7 
## PC_ 5 
## Positive:  Umod, Egf, Hspa1b, Klk1, Hspa1a, Atp1b1, Ly6a, Wnk1, Kng2, Pth1r 
##     CT010467.1, Sfrp1, Pappa2, Iyd, Clu, Car15, Tmem52b, Atp1a1, Sult1d1, Hsp90b1 
##     Cxcl10, Mal, Slc12a1, Wfdc2, Ier3, Mir6236, Igfbp4, Rnaset2a, Prdx5, Krt7 
## Negative:  Mt1, Mt2, S100g, Actb, Egr1, Serf2, H3f3b, Egfl6, Apoe, Atf3 
##     Ppia, Tmsb10, Ptma, Ier2, Herpud1, Neat1, Klf6, Selenow, Ftl1-ps1, Fosb 
##     Atpif1, Rcan1, Ptger3, Mgll, Gadd45b, Zfp36l1, Ugp2, Car2, Ranbp3l, Slc24a5
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8789
## Number of edges: 265255
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7709
## Number of communities: 12
## Elapsed time: 1 seconds
## 11:16:59 UMAP embedding parameters a = 0.9922 b = 1.112
## 11:16:59 Read 8789 rows and found 15 numeric columns
## 11:16:59 Using Annoy for neighbor search, n_neighbors = 30
## 11:16:59 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 11:16:59 Writing NN index file to temp file /var/folders/7n/x74qctp91rng390gx0z9hmd80000gn/T//RtmpfFTIiz/fileef4d1f9f9703
## 11:16:59 Searching Annoy index using 1 thread, search_k = 3000
## 11:17:02 Annoy recall = 100%
## 11:17:02 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 11:17:03 Initializing from normalized Laplacian + noise (using RSpectra)
## 11:17:03 Commencing optimization for 500 epochs, with 368068 positive edges
## 11:17:03 Using rng type: pcg
## 11:17:10 Optimization finished
DimPlot(SO2)

v1markers<- FindAllMarkers(SO2, only.pos = TRUE, min.pct = 0.5, logfc.threshold = 0.5)
## Calculating cluster 0
## For a (much!) faster implementation of the Wilcoxon Rank Sum Test,
## (default method for FindMarkers) please install the presto package
## --------------------------------------------
## install.packages('devtools')
## devtools::install_github('immunogenomics/presto')
## --------------------------------------------
## After installation of presto, Seurat will automatically use the more 
## efficient implementation (no further action necessary).
## This message will be shown once per session
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
## Calculating cluster 9
## Calculating cluster 10
## Calculating cluster 11
v1markers %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1, p_val_adj < 0.05) %>%
    slice_head(n = 5) %>%
    ungroup() -> top3

4.3.2 Top 3 Genes unique to each Cluster

Some of these differences may be very minimal and some have no differences meaning they could be grouped as one.

v1markers<- FindAllMarkers(SO2, only.pos = TRUE, min.pct = 0.5, logfc.threshold = 0.5)
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
## Calculating cluster 9
## Calculating cluster 10
## Calculating cluster 11
v1markers %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1, p_val_adj < 0.05) %>%
    slice_head(n = 3) %>%
    ungroup() -> top3

DoHeatmap(SO2, features = top3$gene) + NoLegend()

4.3.3 Creating Excel File for V1 markers

marker_list <- split(v1markers, v1markers$cluster)


# Sort names alphanumerically

sorted_cluster_names <- names(marker_list)[order(as.numeric(names(marker_list)))]

 

# Create a workbook

wb <- createWorkbook()

 

# Add each cluster as a new worksheet

for (cluster_name in sorted_cluster_names) {

  addWorksheet(wb, sheetName = cluster_name)

  writeData(wb, sheet = cluster_name, marker_list[[cluster_name]])

}

 

date <- format(Sys.Date(), "%Y%m%d")

 

# Save workbook

saveWorkbook(wb, here("jk_code", paste0(date, "_", "_V1_FindAllMarkers_By_Cluster.xlsx")), overwrite = TRUE)

4.4 View by Type

SO2@meta.data <- SO2@meta.data %>%
  mutate(subclass_MD = dplyr::case_when(
    seurat_clusters %in% c(0,1,2,3,4,6,7,8,10) ~ "type_1",
     seurat_clusters %in% c(5,9) ~ "type_2",
    seurat_clusters == 11 ~ "type_3"
  ))
SO2@meta.data$subclass_MD <- factor(
  SO2@meta.data$subclass_MD, 
  levels = c("type_1","type_2", "type_3")
)
DimPlot(SO2,group.by = "subclass_MD")

VlnPlot(SO2,c("Umod","Egf"))

FeaturePlot(SO2,c("Umod","Egf"))

Idents(SO2) <- "subclass_MD"

v1markers_2<- FindAllMarkers(SO2, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
## Calculating cluster type_1
## Calculating cluster type_2
## Calculating cluster type_3
v1markers_2 %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 0.7, p_val_adj < 0.05) %>%
    slice_head(n = 5) %>%
    ungroup() -> top5

DoHeatmap(SO2, features = top5$gene) + NoLegend()
## Warning in DoHeatmap(SO2, features = top5$gene): The following features were
## omitted as they were not found in the scale.data slot for the SCT assay:
## Rnaset2b

### Excel File Creation for Type Markers

marker_list <- split(v1markers_2, v1markers_2$cluster)


# Sort names alphanumerically

sorted_cluster_names <- names(marker_list)[order(as.numeric(names(marker_list)))]
## Warning in eval(quote(list(...)), env): NAs introduced by coercion
# Create a workbook

wb <- createWorkbook()

 

# Add each cluster as a new worksheet

for (cluster_name in sorted_cluster_names) {

  addWorksheet(wb, sheetName = cluster_name)

  writeData(wb, sheet = cluster_name, marker_list[[cluster_name]])

}

 

date <- format(Sys.Date(), "%Y%m%d")

 

# Save workbook

saveWorkbook(wb, here("jk_code", paste0(date, "_", "_V1_FindAllMarkers_By_Type.xlsx")), overwrite = TRUE)

5 V2 - Removing Umod/Egf Expression

5.1 Filtering out Umod > 0

VlnPlot(SO,c("Egf","Umod","Cldn19"))

# If I remove too much Umod and Egf, The cell COunt drops way too low

SO3 <- subset(SO, subset = !(Umod > 0))

VlnPlot(SO3,c("Egf","Umod","Cldn19"))
## Warning in SingleExIPlot(type = type, data = data[, x, drop = FALSE], idents =
## idents, : All cells have the same value of Umod.

FeaturePlot(SO3,"Egf")

DimPlot(SO3)

5.2 Reclustering

SO3 <- SCTransform(SO3) %>%
    RunPCA() %>%
    FindNeighbors(dims = 1:15) %>%
    FindClusters(resolution = 1) %>%
    RunUMAP(dims = 1:15)
## Running SCTransform on assay: RNA
## vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
## Calculating cell attributes from input UMI matrix: log_umi
## Variance stabilizing transformation of count matrix of size 13408 by 2542
## Model formula is y ~ log_umi
## Get Negative Binomial regression parameters per gene
## Using 2000 genes, 2542 cells
## Found 116 outliers - those will be ignored in fitting/regularization step
## Second step: Get residuals using fitted parameters for 13408 genes
## Computing corrected count matrix for 13408 genes
## Calculating gene attributes
## Wall clock passed: Time difference of 6.626872 secs
## Determine variable features
## Centering data matrix
## Place corrected count matrix in counts slot
## Warning: Different cells and/or features from existing assay SCT
## Set default assay to SCT
## PC_ 1 
## Positive:  Fth1, Ubb, Ldhb, Mt1, Ftl1, Eif1, Wfdc2, Prdx1, Fxyd2, Cd63 
##     Aldoa, Mdh1, Car15, Bsg, Ndufa4, Cox8a, H3f3b, Mpc2, Tmem213, Cox4i1 
##     S100a1, Cd9, Ppp1r1a, Mt2, Cox6c, Tle5, Clu, Jund, Atpif1, Mgst1 
## Negative:  Mir6236, CT010467.1, Neat1, Kcnq1ot1, Nme7, Pappa2, Slc12a1, Etnk1, Lars2, Col4a4 
##     Hnrnpa2b1, Robo2, Atrx, Pnisr, Zbtb20, Dst, Syne2, Itprid2, Ddx5, Wnk1 
##     Sptbn1, Col4a3, Tfcp2l1, Pou3f3, Prpf4b, Utrn, Itga4, Prrc2c, Nfe2l1, Zmiz1 
## PC_ 2 
## Positive:  Fos, Jun, Hspa1b, Hspa1a, Junb, Btg2, Zfp36, Egr1, Klf6, Fosb 
##     Ier2, Socs3, Jund, Atf3, Klf2, Ubc, Rhob, Dusp1, Dnajb1, Gadd45g 
##     Mt2, Mt1, Tsc22d1, Zfp36l1, Gadd45b, Bhlhe40, Cebpd, Actb, Neat1, H3f3b 
## Negative:  Pappa2, Aard, Car15, Cd9, Pth1r, Ldhb, Mpc2, Tmem158, Mgst1, Ly6a 
##     Tmem59, Mcub, Fth1, Clu, Wwc2, Hsp90b1, Bsg, Dctd, Tmbim6, Ramp3 
##     Pdia6, Prdx1, Epcam, Sfrp1, Pdia3, Mdh1, Iyd, Ppp1r1a, Cd81, Rnaset2a 
## PC_ 3 
## Positive:  Pappa2, Aard, Tmem158, Wwc2, Ptgs2, Mcub, Iyd, Cd9, Ptprz1, Dctd 
##     Ramp3, Hsp90b1, Car15, Tsc22d1, Btg2, Hspa5, Ctsc, Cdkn1c, Bmp2, Pdia6 
##     Tagln2, Defb42, H3f3b, Junb, Clu, Egr1, Cebpd, Lmna, Gsn, Fos 
## Negative:  Mt1, Sostdc1, Apoe, Aebp1, Ckb, Gpx6, Mt2, Cox6c, Chchd10, Defb1 
##     Mfsd4a, Atp1a1, Ptger3, Fgf9, Egfl6, Car4, Atp5md, Tmem213, Atp5k, Cox6b1 
##     Lpl, Cox5b, Cox8a, Cox7a2, Ndufa4, Ivns1abp, Uqcrh, Atp5mpl, Fkbp11, Neat1 
## PC_ 4 
## Positive:  Hspa1a, Hspa1b, Klk1, Car15, Sfrp1, CT010467.1, Atp1a1, Clu, Fth1, Jun 
##     Clcnkb, Pik3r1, Ly6a, Wnk1, Cited2, Slc12a1, Rnaset2a, Mal, Atp1b1, Id3 
##     Iyd, Pth1r, Tmem176a, Kng2, Cldn10, Tmem213, Fau, Lpl, Cd63, Robo2 
## Negative:  S100g, Actb, Ppia, Tmsb10, Mt1, Serf2, Ndufb1-ps, Ptma, Atp5md, Atp5e 
##     Atp5k, Atp5mpl, Tpt1, Aard, Tmsb4x, Cox6c, Atpif1, Calm1, Cebpd, Ndufc1 
##     Igfbp7, Klf6, Dbi, Alkbh5, Egr1, S100a11, Ftl1-ps1, Zfp36l1, Tagln2, Pappa2 
## PC_ 5 
## Positive:  Mt2, Mt1, Pappa2, Ptger3, Aard, Begain, Tmsb4x, Apoe, Cd63, CT010467.1 
##     Mgll, Ckb, Eef1a1, Egfl6, H2-D1, Hspa8, Acot1, Ftl1, Wfdc2, Tesc 
##     Bst2, Luzp2, Hspa5, Adarb2, Zfand5, Selenow, Pantr1, Khdrbs2, H2-K1, Hspb1 
## Negative:  S100g, Pth1r, Klk1, Atp1b1, Nme7, Ly6a, Wnk1, Kng2, Iyd, Sgms2 
##     Defb1, Egf, Clu, Kng1, Mpc2, Col4a3, Cebpd, Cox6c, Sfrp1, Ptgs2 
##     Atp5md, Cox7b, Cwh43, Sdc4, Slc25a30, Atp6v1c2, Cox8a, Tbxas1, S100a1, Ddx5
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 2542
## Number of edges: 86513
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.6358
## Number of communities: 9
## Elapsed time: 0 seconds
## 11:17:59 UMAP embedding parameters a = 0.9922 b = 1.112
## 11:17:59 Read 2542 rows and found 15 numeric columns
## 11:17:59 Using Annoy for neighbor search, n_neighbors = 30
## 11:17:59 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 11:17:59 Writing NN index file to temp file /var/folders/7n/x74qctp91rng390gx0z9hmd80000gn/T//RtmpfFTIiz/fileef4d2b21ec9a
## 11:17:59 Searching Annoy index using 1 thread, search_k = 3000
## 11:17:59 Annoy recall = 100%
## 11:18:00 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 11:18:01 Initializing from normalized Laplacian + noise (using RSpectra)
## 11:18:01 Commencing optimization for 500 epochs, with 103290 positive edges
## 11:18:01 Using rng type: pcg
## 11:18:03 Optimization finished
DimPlot(SO3)

5.2.1 Viewing Expression

VlnPlot(SO3,c("Umod","Egf"))

5.3 Viewing Markers by Cluster

v2markers<- FindAllMarkers(SO3, only.pos = TRUE, min.pct = 0.5, logfc.threshold = 0.5)
## Calculating cluster 0
## Calculating cluster 1
## Warning in FindMarkers.default(object = data.use, cells.1 = cells.1, cells.2 =
## cells.2, : No features pass logfc.threshold threshold; returning empty
## data.frame
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
v2markers %>%
    group_by(cluster) %>%
    dplyr::filter(avg_log2FC > 1, p_val_adj < 0.05) %>%
    slice_head(n = 3) %>%
    ungroup() -> top3v2

DoHeatmap(SO3, features = top3v2$gene) + NoLegend()

VlnPlot(SO3,c("Pappa2","Ptger3","Egr1","Aard"))

5.3.1 Excel File v2 by cluster

marker_list <- split(v2markers, v2markers$cluster)


# Sort names alphanumerically

sorted_cluster_names <- names(marker_list)[order(as.numeric(names(marker_list)))]

 

# Create a workbook

wb <- createWorkbook()

 

# Add each cluster as a new worksheet

for (cluster_name in sorted_cluster_names) {

  addWorksheet(wb, sheetName = cluster_name)

  writeData(wb, sheet = cluster_name, marker_list[[cluster_name]])

}

 

date <- format(Sys.Date(), "%Y%m%d")

 

# Save workbook

saveWorkbook(wb, here("jk_code", paste0(date, "_", "_V2_FindAllMarkers_By_Clustere.xlsx")), overwrite = TRUE)